BRIEF


The General Classification Test, Arithmetic Test, Mechanical Test
and Clerical Test of the Basic Test Battery, which are ordinarily
administered to each recruit during his fourth day of recruit
training, were experimentally readministered several months later
to three samples of men immediately prior to their starting
training in Electrician's Mate, Hospitalman, and Interior
Communications Electrician's Class "A" schools. The purpose of the
experiment was to determine whether the test validities would be
comparable for the two administrations, since a considerable economy
could be effected in evaluating new tests if the validities were
found to be comparable.

Since four tests were tried at each of three schools, there
were twelve comparisons between the early (predictive) and later
(concurrent) validity coefficients. In each case the concurrent
validity coefficient was higher, the differences ranging from .01
to .09. It was concluded that in order for a test to be considered
for possible use after a concurrent tryout, it must be at least .05
to .09 more valid than the operationally given (predictive) tests.
Possible explanations for these findings are discussed.
COMPARISON OF PREDICTIVE AND CONCURRENT VALIDITIES
OF BASIC TEST BATTERY TEST SCORES


A. BACKGROUND AND PURPOSE

The enlisted classification tests of the Navy Basic Test Battery
(BTB) are administered on the recruits' fourth day of recruit
training. An experimental test being evaluated for possible inclusion
in the BTB is not ordinarily administered at a comparable time, i.e.,
the recruits' fourth day in the Navy, partly because this would make
it necessary to test many more recruits than would ultimately attend
the schools at which validation would take place. In addition, there
would be at least an eleven-week delay between the time the test was
taken and the time Class "A" school training was begun. Sometimes,
therefore, to conserve testing time and to accelerate the validation
process, experimental classification tests are administered to
enlisted men at the time they enter a particular Class "A" school, and
are validated against final grades obtained in that school.

Although a saving in time and money can be realized by administering
experimental classification tests just prior to Class "A" school
training, there is a possibility that this practice may distort the
validities obtained. It is possible, for example, that the validity
of a test could be spuriously elevated if the test is administered to
men just about to start Class "A" school training. The experimental
test would in this case appear to be more valid than it actually was,
and there would be a danger of adopting an experimental test at the
expense of an operational test which was actually more valid. If, on
the other hand, the validity of a test administered at Class "A"
school were lower than the validity of the same test administered
early in recruit training, the opposite danger would exist. In order
to make maximally effective use of validity information obtained from
tests administered at Class "A" schools, it is necessary to know the
extent and direction of any bias in the validities so obtained. The
purpose of this study was to investigate the effect on test validity
of administering experimental predictor tests at two different points
in time.


B. PROCEDURE


The procedure followed in this study was to compare the validities
of three tests in the BTB administered as a part of classification
testing with the validities of the same tests administered upon
entry into Class "A" school.
1. Predictor Tests


The following tests from the BTB were used as predictors:

a. The General Classification Test (GCT) is a 100-item test of
verbal aptitude consisting of sentence completion and verbal analogy
items (^). A single Navy Standard Score (NSS), having a mean of 50
and a standard deviation of 10, was used.

b. The Arithmetic Test (ARI) consists of two separately timed
subtests (_^). These are a 20-item Arithmetic Computation subtest,
which provides a measure of speed and accuracy in performing
elementary computations, and a 30-item Arithmetic Reasoning subtest,
which provides a measure of ability to solve verbally presented
quantitative problems. A total score in NSS form was used.

c. The Mechanical Test (MECH) consists of two separately timed
50-item subtests: a Mechanical Comprehension subtest and a Tool
Knowledge subtest (8). A total score in NSS form was used.

2. Predictive and Concurrent Administrations


The same forms of the above tests were administered twice to all
subjects: once during classification testing on their fourth day of
training and once just prior to their entry into Class "A" school
training. The regular administration during classification testing
will be referred to as the "predictive" administration of the tests,
and the experimental administration just prior to the beginning of
Class "A" school training will be referred to as the "concurrent"
administration of the tests. Similarly, "predictive validities" and
"predictive means" will refer to the validities and means of the BTB
tests based upon the first administration during classification
testing, and "concurrent validities" and "concurrent means" will
refer to the validities and means of the tests based upon their
second administration, just before the beginning of Class "A" school
training.

3. Subjects

The subjects for the present study comprised 267 Electrician's
Mate (EM), 336 Hospitalman (HM) and 266 Interior Communications
Electrician (IC) Class "A" school non-fleet trainees who entered
school from September through November, 1961. All schools were
located in San Diego.

4. Criterion Data


The final school grade obtained in each training program
constituted the criterion. Academic drops were not included in the
analysis.
C. STATISTICAL ANALYSIS


Means, standard deviations, and validities against final school
grade were obtained for both administrations of the GCT, ARI and
MECH for each school sample. Intercorrelations among the predictors
were also obtained. Average correlations for both administrations of
GCT, ARI and MECH were computed using Fisher's r-to-z transformation.
The significance of the difference between the predictive and
concurrent validity for each test in each sample was obtained using a
t-test for differences between correlated correlations (4, p. 148).

The significance of the difference between the predictive and
concurrent validity for each test, averaged over all three samples,
was obtained using the z-test described by Winer (^, p. 44). The
significance of the difference between the predictive and concurrent
means and standard deviations for each test in each sample was
determined using the tests for differences between correlated
measures.
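
The averaging step described in this section can be sketched as follows. This is a minimal illustration, not the computation actually used in the study: the function name is hypothetical, and weighting each z by N - 3 is an assumption, since the report does not state how the three samples were weighted. The ARI validities are those of Table 1, with the EM concurrent value read as .28, the figure that appears in Table 3.

```python
import math

def mean_correlation(rs, ns):
    """Average several correlations via Fisher's r-to-z transformation.

    Assumption: each z is weighted by (n - 3), the reciprocal of its
    sampling variance. The report does not say which weights were used.
    """
    zs = [math.atanh(r) for r in rs]            # r -> z
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)                     # z -> r

# Sample sizes and ARI validities for the EM, HM and IC samples.
ns = [267, 336, 266]
ari_pred = [0.19, 0.10, 0.24]
ari_conc = [0.28, 0.17, 0.28]   # EM value taken from Table 3

mean_pred = mean_correlation(ari_pred, ns)
mean_conc = mean_correlation(ari_conc, ns)
print(round(mean_pred, 2), round(mean_conc, 2))
```

Under these assumptions the two averages round to the mean ARI correlations shown in Table 1 (.17 and .24).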


D. RESULTS


1. Test Validities


The validities for all tests are presented in Table 1. It can
be seen that the predictive validity for each test is lower than
its concurrent validity in every sample. The only two significant
differences, however, are for GCT and ARI in the HM sample, both
of which are significant at the .05 level (t = 2.43 and 2.46,
respectively).
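
The two t values quoted above can be approximately reproduced with Hotelling's formula for the difference between two correlations that share a variable. The report cites only its reference (4) for this test, so the choice of Hotelling's formula is an assumption, and the function name is hypothetical. The inputs are the HM validities from Table 1 and the correlations between the two administrations of each test from Table 4.

```python
import math

def correlated_r_t(r_a, r_b, r_ab, n):
    """t for the difference between two correlated correlations
    (Hotelling's formula, df = n - 3).

    r_a, r_b : correlations of measures A and B with the common criterion
    r_ab     : correlation between measures A and B
    """
    det = 1 - r_a**2 - r_b**2 - r_ab**2 + 2 * r_a * r_b * r_ab
    return (r_a - r_b) * math.sqrt((n - 3) * (1 + r_ab) / (2 * det))

# HM sample, N = 336: concurrent validity, predictive validity, and the
# correlation between the concurrent and predictive scores on each test.
t_gct = correlated_r_t(0.27, 0.21, 0.89, 336)
t_ari = correlated_r_t(0.17, 0.10, 0.86, 336)
print(round(t_gct, 2), round(t_ari, 2))
```

With these inputs the sketch gives values that agree with the reported 2.43 and 2.46 to two decimals, which suggests, but does not prove, that a formula of this form was used.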


TABLE 1

Comparison of Validity Coefficients

                               GCT                  ARI                  MECH
School               N  Pred.  Conc.  Diff.  Pred.  Conc.  Diff.  Pred.  Conc.  Diff.

EM                 267  .33    .39    .06    .19    .28    .09    .21    .27    .06
HM                 336  .21    .27    .06*   .10    .17    .07*   .05    .06    .01
IC                 266  .39    .40    .01    .24    .28    .04    .25    .27    .02

Mean Correlation        .30    .34    .04*   .17    .24    .07**  .16    .19    .03

Note.
 *  The difference is significant at the .05 level.
 ** The difference is significant at the .01 level.
The mean differences across schools between the predictive and
concurrent validities for GCT, ARI and MECH are also given in
Table 1. These mean differences are .04, .07, and .03, respectively.
The differences between the mean predictive and mean concurrent
validities are significant at the .05 level for GCT (z = 2.56) and at
the .01 level for ARI (z = 3.09). The mean difference for MECH was
not significant (z = 1.39).
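
One way to arrive at z values of this size is to combine the three per-school t statistics for a test into a single normal deviate: the sum of the t's divided by the standard deviation of that sum, where a t with f degrees of freedom has variance f / (f - 2). The sketch below assumes that this is the procedure cited from Winer and that the per-school t's come from Hotelling's formula; both are assumptions, and the function names are hypothetical. Validities are taken from Table 1 and the correlations between administrations from Tables 3, 4 and 5.

```python
import math

def correlated_r_t(r_a, r_b, r_ab, n):
    # Hotelling's t for two correlations sharing a criterion (df = n - 3).
    det = 1 - r_a**2 - r_b**2 - r_ab**2 + 2 * r_a * r_b * r_ab
    return (r_a - r_b) * math.sqrt((n - 3) * (1 + r_ab) / (2 * det))

def combined_z(ts, dfs):
    # Sum of independent t's over the standard deviation of that sum.
    return sum(ts) / math.sqrt(sum(f / (f - 2) for f in dfs))

# (concurrent r, predictive r, r between administrations, N) for EM, HM, IC.
samples = {
    "GCT":  [(.39, .33, .80, 267), (.27, .21, .89, 336), (.40, .39, .86, 266)],
    "ARI":  [(.28, .19, .66, 267), (.17, .10, .86, 336), (.28, .24, .78, 266)],
    "MECH": [(.27, .21, .77, 267), (.06, .05, .87, 336), (.27, .25, .82, 266)],
}

z_values = {}
for test, rows in samples.items():
    ts = [correlated_r_t(*row) for row in rows]
    dfs = [row[3] - 3 for row in rows]
    z_values[test] = combined_z(ts, dfs)
    print(test, round(z_values[test], 2))
```

Because the inputs are two-decimal table entries, the results should be expected to match the reported z values only to within a few hundredths.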


Examination of the validities in Table 1 suggests that the
magnitude of the difference between the predictive and concurrent
validity is independent of the magnitudes of the validities involved,
and does not appear to be related to whether or not the tests in
question were used as selectors for the schools.¹


2. Means


The means and standard deviations for all tests are presented
in Table 2. The concurrent mean for each test is higher than its
predictive mean in every sample. The mean differences are all
statistically significant at the .01 level of confidence.

The obtained mean differences average 3.0 points for GCT, 1.7
points for ARI and 3.1 points for MECH, with an average mean
difference of 2.6 points.
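
A test for the difference between correlated means can be sketched from summary statistics alone, since the variance of the difference scores follows from the two standard deviations and the correlation between administrations. The report does not give its formula, so the paired t below is an assumption, and the function name is hypothetical. The example uses the EM sample's GCT figures from Table 2 and the correlation of .80 between the two GCT administrations from Table 3.

```python
import math

def paired_t_from_summary(mean_1, mean_2, sd_1, sd_2, r, n):
    """t for the difference between two correlated means (df = n - 1),
    computed from summary statistics rather than raw scores."""
    # Variance of the difference scores, given the correlation r.
    var_diff = sd_1**2 + sd_2**2 - 2 * r * sd_1 * sd_2
    return (mean_2 - mean_1) / math.sqrt(var_diff / n)

# EM sample, GCT: predictive then concurrent mean and standard deviation.
t_em_gct = paired_t_from_summary(56.65, 59.88, 7.60, 6.60, 0.80, 267)
print(round(t_em_gct, 1))
```

The resulting t is far beyond the two-tailed .01 critical value of about 2.6, in line with the statement that all mean differences are significant at the .01 level.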

3. Standard Deviations


The predictive and concurrent test standard deviations are given
in Table 2. In the EM sample, the predictive standard deviations
are significantly larger than the concurrent standard deviations at
the .01 level for GCT, ARI and MECH. A z-test combining all three
schools shows the difference between predictive and concurrent
standard deviations to be significant at the .01 level for GCT (z =
2.88), at the .05 level for ARI (z = 2.40), and not significant for
MECH (z = 1.80).
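
A common test for the difference between two correlated variances is the Pitman-Morgan t, sketched below. The report refers only to "tests for differences between correlated measures", so using this particular test is an assumption, and the function name is hypothetical. The standard deviations are those of the GCT in Table 2, and the correlations between administrations come from Tables 3 and 4.

```python
import math

def correlated_variance_t(sd_1, sd_2, r, n):
    """Pitman-Morgan t for two correlated variances (df = n - 2).

    sd_1, sd_2 : standard deviations on the two administrations
    r          : correlation between the two administrations
    """
    f = (sd_1 / sd_2) ** 2                      # variance ratio
    return (f - 1) * math.sqrt(n - 2) / (2 * math.sqrt(f * (1 - r**2)))

# GCT, predictive vs. concurrent standard deviations.
t_em = correlated_variance_t(7.60, 6.60, 0.80, 267)   # EM sample
t_hm = correlated_variance_t(7.41, 7.34, 0.89, 336)   # HM sample
print(round(t_em, 2), round(t_hm, 2))
```

Under this assumption the EM difference clearly exceeds the .01 critical value while the HM difference does not approach significance, which matches the pattern marked in Table 2.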

4. Predictor and Criterion Intercorrelations


The intercorrelations among the predictors and criteria for the
EM, HM and IC samples are presented in Tables 3, 4 and 5,
respectively, in the appendix. These intercorrelations were used in
computing tests of significance, and are included only for reference
purposes.


¹Selection requirements for HM school were a minimum total of
100 on GCT + ARI. For EM and IC schools, ARI + MECH must equal at
least 105, or at least 100 and a minimum of 53 on the Electronic
Technician Selection Test.
TABLE 2


Means and Standard Deviations of Test Scores


                                 GCT                     ARI                     MECH
School               N   Pred.   Conc.   Diff.   Pred.   Conc.   Diff.   Pred.   Conc.   Diff.

Means

EM                 267   56.65   59.88    3.23   56.61   58.29    1.68   55.45   58.77    3.32
HM                 336   56.46   59.37    2.89   53.97   55.93    1.96   48.46   51.38    2.92
IC                 266   56.86   59.91    3.05   56.83   58.29    1.46   55.66   58.82    3.16

Average difference                         3.0                     1.7                     3.1

Standard Deviations

EM                 267    7.60    6.60   -1.00*   6.59    5.36   -1.23*   6.97    6.10    -.87*
HM                 336    7.41    7.34    -.07    6.71    6.97     .26    7.53    7.61     .08
IC                 266    6.76    6.59    -.17    5.60    5.37    -.23    6.09    6.06    -.03

Notes.--
   All mean differences in Table 2 are significant at the .01 level.
 * The difference between the designated predictive and concurrent
   standard deviations is significant at the .01 level.


E. DISCUSSION


In the present samples, differences between predictive and
concurrent validities ranged from .01 to .09, with an average
difference of about .05. Consequently, an investigator would be well
advised to question the apparent superiority of an experimental test,
even if it seems, on the basis of concurrent validation, that its
validity is as much as .09 higher than the predictive validity of its
operational counterpart.

These results are somewhat different from the results obtained
by Satter and Frederiksen (7) and Frederiksen (1), who found no
significant difference between the validity of a test administered
during recruit classification testing and the validity of the same
test administered at the end of Class "A" school training. Their
study differed from the present study in a number of respects,
however, both in design and aim. Satter and Frederiksen used a
cross-sectional design rather than the longitudinal design employed
in the present study (testing two groups rather than testing the
same group twice). Their design raises unresolved problems in the
matching of groups with regard to predictability. The interested
reader will find a condensed version of their study in an article
appearing in Educational and Psychological Measurement (2).

In the present study, the same tests were administered twice to
the same individuals: once predictively, and once concurrently.
The apparent superiority of the concurrently administered tests
could be a function, in part at least, of practice effects. These
practice effects should be minimal, however, inasmuch as three or
more months elapsed between test administrations.

Similarly, practice effects could contribute to the obtained
differences in score distributions between predictive and concurrent
administrations of the tests. Whatever the effect of practice upon
the mean score, the data suggest that the increment in validity of
the tests is not entirely attributable to practice effects: ARI,
which shows the greatest difference in validity, is the test which
shows the smallest difference in mean between the two
administrations.

An alternative explanation, more consistent with the data, is
that it is the time of testing which is important; the test
performance of some recruits may be unduly lowered by the excitement
of their first days in the Navy. Presumably the later testing took
place at a time when these recruits were more relaxed and had
attained a more stable adjustment to the Navy environment. In support
of this explanation it is noted that, in general, the means are
higher, the standard deviations are smaller, and the validities are
higher for the concurrent tests than for the predictive tests. These
combined factors indicate that there was a greater tendency for those
scoring low on the first testing to raise their scores than for those
scoring high on the first testing to raise their scores. An earlier
study comparing BTB scores for third and ninth day testing of recruits
attributed the higher ninth day scores to acclimatization, but
validities were not available for the groups tested (3).

The present findings tend to suggest that differences between
predictive and concurrent validities and means might well be a
function of the type of test being used. A more thorough knowledge is
needed of the degree to which different types of tests are affected
by differences in testing times and conditions. If subsequent
research confirms the suggestion contained in the data that practice
enhances validity, further research seems indicated on how best to
capitalize on this finding. For example, the effects of including
more practice items in a test could be evaluated. If, on the other
hand, it is determined that "adaptation" or "acclimatization" is
responsible for higher validities, consideration should be given to
the possibility of testing recruits later in training than is done at
present. Further analysis of the data gathered in the present study
is being undertaken to evaluate several of the above possibilities.


F. CONCLUSIONS AND RECOMMENDATIONS


Tests given immediately prior to the beginning of Class "A"
school training (concurrent testing) showed a gain in validity of
.01 to .09 (average = .05) over the same tests given about three
months earlier (predictive testing).

While the present design does not permit differentially assessing
the degree to which differences in validity may be due to practice
effects or to differences in time of test administration, the
findings suggest that a conservative attitude is desirable in
deciding whether or not to replace an operational test with an
experimental one. More specifically, experimental tests validated
concurrently should show an increment in validity of at least .05
to .09 over operational tests before they are considered for
adoption.
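
The summary figures and the recommendation can be illustrated with a short sketch. The validities are those of Table 1, with the EM concurrent ARI value read as .28 as in Table 3. The screening function, its name, and its default margin of .09 (the conservative end of the recommended range) are hypothetical and are not part of the original study.

```python
from statistics import mean

# (predictive, concurrent) validities for GCT, ARI and MECH in each sample.
validities = {
    "EM": [(.33, .39), (.19, .28), (.21, .27)],
    "HM": [(.21, .27), (.10, .17), (.05, .06)],
    "IC": [(.39, .40), (.24, .28), (.25, .27)],
}

# Gain in validity from predictive to concurrent administration.
gains = [round(conc - pred, 2)
         for pairs in validities.values()
         for pred, conc in pairs]
print(min(gains), max(gains), round(mean(gains), 2))

def worth_considering(experimental_concurrent_r, operational_predictive_r,
                      margin=0.09):
    """Hypothetical screening rule: the concurrently validated experimental
    test must exceed the operational test's predictive validity by at
    least `margin` before it is considered for adoption."""
    gain = experimental_concurrent_r - operational_predictive_r
    return round(gain, 2) >= margin

print(worth_considering(0.45, 0.33))   # gain of .12
print(worth_considering(0.38, 0.33))   # gain of .05
```

A stricter or looser margin within the .05 to .09 range can be passed explicitly, for example `worth_considering(0.38, 0.33, margin=0.05)`.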
APPENDIX


TABLE 3

Predictor and Criterion Intercorrelations
For the EM Class "A" School Sample
(N = 267)

                 Concurrent           Predictive
                1      2      3      4      5      6      7
              GCT    ARI   MECH    GCT    ARI   MECH  CRITERION

1. GCT               .32    .24    .80    .26    .22    .39
2. ARI                     -.02    .28    .66   -.03    .28
3. MECH                            .26   -.02    .77    .27
4. GCT                                    .49    .39    .33
5. ARI                                           .23    .19
6. MECH                                                 .21
7. CRITERION

TABLE 4

Predictor and Criterion Intercorrelations
For the HM Class "A" School Sample
(N = 336)

                 Concurrent           Predictive
                1      2      3      4      5      6      7
              GCT    ARI   MECH    GCT    ARI   MECH  CRITERION

1. GCT               .51    .23    .89    .46    .22    .53
2. ARI                      .17    .48    .86    .16    .50
3. MECH                            .20    .16    .87    .12
4. GCT                                    .51    .23    .51
5. ARI                                           .17    .44
6. MECH                                                 .10
7. CRITERION
TABLE 5

Predictor and Criterion Intercorrelations
For the IC Class "A" School Sample
(N = 266)

                 Concurrent           Predictive
                1      2      3      4      5      6      7
              GCT    ARI   MECH    GCT    ARI   MECH  CRITERION

1. GCT               .33    .23    .86    .25    .21    .40
2. ARI                     -.02    .32    .78   -.03    .26
3. MECH                            .22   -.10    .82    .27
4. GCT                                    .33    .22    .39
5. ARI                                          -.04    .24
6. MECH                                                 .25
7. CRITERION


